Conversation

@hudson-ai (Collaborator) commented Apr 22, 2025

Behavior changes:

  • lazy execution (lm += foo() always returns immediately)
  • execution triggered by stateful access, e.g. str(lm), lm[key], etc. (see the sketch just below)
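
A minimal sketch of the intended flow (foo, bar, and "key" are stand-ins for arbitrary @guidance functions and capture names, not part of this PR's API):

lm = models.OpenAI("gpt-4o-mini", echo=False)
lm += foo()        # returns immediately; nothing has executed yet
lm += bar()        # still lazy -- we're just accumulating the program
text = str(lm)     # stateful access: the accumulated program executes here
value = lm["key"]  # another stateful access; uses the already-executed state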

Introduces:

  • async stateful accessors, e.g. await lm.get_async(key) (API is still a WIP)
  • async guidance functions (i.e. the @guidance decorator on async def functions)
    • allows use of async accessors inside guidance functions
    • as well as other async APIs, semaphores, etc.

Note: async accessors are fully compatible with non-async guidance functions (even stateful ones). I.e. you don't have to rewrite your existing guidance functions as async to get the concurrency benefits of async accessors farther up the stack.

Here's an example usage:

  1. The main logic is encapsulated in a normal (non-async) @guidance function extract_image_data -- it does not need to be aware that its callers may be async!
  2. An async @guidance function get_and_describe_image that uses external async functions, namely the get method of an httpx.AsyncClient.
    • Note that while async accessors are perfectly valid, non-async accessors on the Model object (lm) are disallowed inside of async @guidance functions and will raise an exception. We could probably "fix" this, but it's honestly kind of a nice safeguard against shooting ourselves in the foot.
  3. An async main function that gathers some number of coroutines returned by an async accessor on each of our unevaluated guidance programs.

import httpx
import asyncio
from guidance import *

@guidance
def extract_image_data(lm, image_bytes):
    with user():
        lm += "What is in this image?"
        lm += image(image_bytes)
    with assistant():
        lm += json(
            schema = {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "colors": {"type": "array", "items": {"type": "string"}},
                    "objects": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["description", "colors", "objects"],
                "additionalProperties": False
            },
            name = "data",
        )
    return lm

@guidance
async def get_and_describe_image(lm, client):
    resp = await client.get("https://picsum.photos/200")
    resp.raise_for_status()
    image_bytes = resp.content
    lm += extract_image_data(image_bytes)
    return lm

async def main():
    lm = models.OpenAI("gpt-4o-mini", echo=False)
    async with httpx.AsyncClient(follow_redirects=True) as client:
        lms = [
            lm + get_and_describe_image(client)
            for _ in range(10)
        ]
        datas = await asyncio.gather(*[lm.get_async("data") for lm in lms])
    return datas
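
For completeness, running this from a script would use the standard asyncio entry point:

if __name__ == "__main__":
    datas = asyncio.run(main())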

@guidance functions can also be naively parallelized (regardless of whether or not they are async) via the batched entrypoints:

lms = lm.run_batched([func_1(), ..., func_n()])
lms = await lm.async_run_batched([func_1(), ..., func_n()])

Note that these entrypoints actually run the functions and are not lazy like += is.
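
A sketch of how the image example above might look with the async batched entrypoint (assuming it accepts the same unevaluated guidance calls as += does):

async def batched_main():
    lm = models.OpenAI("gpt-4o-mini", echo=False)
    async with httpx.AsyncClient(follow_redirects=True) as client:
        # Eagerly run all ten programs concurrently (unlike +=, this is not lazy).
        lms = await lm.async_run_batched(
            [get_and_describe_image(client) for _ in range(10)]
        )
    # Everything has already executed, so plain (non-async) accessors should be fine here.
    return [out["data"] for out in lms]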

TODOs:

  • fix stateful capture blocks
  • put token_count on state
  • how to trigger streams?
  • add example usage to this PR
  • fix and un-comment calls to vis/renderer
  • stabilize async accessor api
  • make a decision about the "ambiguous forking" problem
  • do some profiling experiments to ensure we're not introducing unnecessary overhead (e.g. compare to manual thread-based parallelism)
  • documentation

@riedgar-ms (Collaborator) commented:

So the synchronous versions just do a Task.run() (or whatever it is)? Presumably that spins up a short-lived event loop.... I'm guessing we're not concerned about performance implications on that?

@hudson-ai (Collaborator, Author) commented:

> So the synchronous versions just do a Task.run() (or whatever it is)? Presumably that spins up a short-lived event loop.... I'm guessing we're not concerned about performance implications on that?

We're maintaining a single long-lived event loop in a daemon thread (which has its own implications I suppose), so we just submit the coroutine and block the main thread until it's ready.

The nice thing is that this is only happening at the very top-level entry point, so we don't need multiple threads or anything like that to support recursive calls. Getting that working without deadlocks was an interesting exercise -- more than happy to look at that code together!
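
The general pattern is roughly the following (a simplified sketch, not the actual guidance._reentrant_async implementation):

import asyncio
import threading

# One long-lived event loop running forever in a daemon thread.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

def run_sync(coro):
    # Submit the coroutine to the background loop and block the calling
    # thread until its result is available.
    return asyncio.run_coroutine_threadsafe(coro, _loop).result()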

@codecov-commenter commented Apr 24, 2025

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 68.28087% with 131 lines in your changes missing coverage. Please review.

Project coverage is 55.73%. Comparing base (3918b36) to head (9202113).
Report is 1 commit behind head on main.

Files with missing lines                 Patch %   Missing lines
guidance/_ast.py                          68.37%   37
guidance/models/_base/_model.py           77.86%   27
guidance/models/experimental/_vllm.py     28.57%   20
guidance/_reentrant_async.py              68.29%   13
guidance/models/_openai_base.py           58.33%   10
guidance/models/_azureai.py                0.00%    9
guidance/models/_base/_interpreter.py     88.88%    4
guidance/models/_base/_state.py           50.00%    4
guidance/_guidance.py                     50.00%    3
guidance/library/_gen.py                   0.00%    2
... and 1 more


Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1183       +/-   ##
===========================================
+ Coverage   40.63%   55.73%   +15.10%     
===========================================
  Files          62       63        +1     
  Lines        4782     4972      +190     
===========================================
+ Hits         1943     2771      +828     
+ Misses       2839     2201      -638     


@Harsha-Nori (Member) commented:

Thanks for the push here @hudson-ai -- fantastic work, really.

I've always been a huge fan of Jax's async dispatch model, and want to better understand why benefit 2 will no longer apply in an async dispatch world. Can't we keep e.g. a debounce buffer that batches objects as much as we can, thereby getting most of the benefit anyway?

There might be a solution here that replaces lazy eval with non-blocking eager eval, akin to jax's async dispatch, where we could introduce a block_until_ready method (or just its async counterpart). But I am hesitant, mainly due to lazy execution's benefit no. 2 above.

@hudson-ai (Collaborator, Author) commented:

> Thanks for the push here @hudson-ai -- fantastic work, really.
>
> I've always been a huge fan of Jax's async dispatch model, and want to better understand why benefit 2 will no longer apply in an async dispatch world. Can't we keep e.g. a debounce buffer that batches objects as much as we can, thereby getting most of the benefit anyway?
>
> There might be a solution here that replaces lazy eval with non-blocking eager eval, akin to jax's async dispatch, where we could introduce a block_until_ready method (or just its async counterpart). But I am hesitant, mainly due to lazy execution's benefit no. 2 above.

Thanks @Harsha-Nori! And I appreciate the input / question. I don't honestly know the answer -- maybe some kind of buffering would work. Just going to think out loud a bit...

Let's say we have a chain of lm objects:

lm_1 = lm + foo(name="foo")
lm_2 = lm_1 + bar(name="bar") 
lm_3 = lm_2 + baz(name="baz")

With lazy execution as it's implemented in this PR, nothing gets executed until we do something like lm_3["bar"], at which point we run the chain foo(...) + bar(...) + baz(...). If we try to access an earlier one, e.g. lm_2["bar"], we have to run the chain foo(...) + bar(...), and we may get a different answer.

I'm imagining that if we did async dispatch + eager execution (no buffering), each of lm_1, lm_2, and lm_3 would essentially have a Future under the hood, with the bar part of lm_2 being unable to execute until the foo part of lm_1 does, etc.
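
Very roughly, and with entirely made-up names, I mean something like this:

class EagerModel:
    def __init__(self, parent_future, op):
        async def _run():
            parent_state = await parent_future  # can't start until the parent finishes
            return await op(parent_state)       # then run this node's own piece
        # Scheduled immediately -- this is the "eager dispatch" part.
        self.future = asyncio.ensure_future(_run())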

With debounce-style buffering, we could track parent-child relationships, and noting that lm_1 and lm_2 both have children, we wouldn't run anything for them at all, only computing lm_3's foo(...) + bar(...) + baz(...). But we'd then have to somehow back-fill lm_1 and lm_2, e.g. in case someone tries to access lm_1["foo"].

This doesn't seem too bad, but I think the story gets far more complicated once we start having branching calls / arbitrary DAGs.

E.g.

for _ in range(100):
    lm += qux()

lm_1 = lm
for _ in range(100):
    lm_1 += foo()

lm_2 = lm
for _ in range(100):
    lm_2 += bar()

lm_1 and lm_2 share a common ancestor, namely lm with its 100 quxes. What if both of them start trying to compute their chains (qux() + ... + qux() + foo() + ... + foo() and qux() + ... + qux() + bar() + ... + bar(), respectively)? Do they have to compete to acquire a lock on lm to make sure only one value gets computed for qux() + ... + qux()? If so, that means we can't parallelize the foos and the bars. For non-trivial DAGs, this means we probably miss a ton of speedup opportunities for things that should be embarrassingly parallel.

If we can figure out the right way to do this "back-filling", I kind of like the idea. But it's also a bit spooky... Thoughts?

@hudson-ai (Collaborator, Author) commented:

Some kind of lm.run() is a lot less magic and, in a lot of ways, a lot more cumbersome (e.g. having to call run() before every getitem, lest an exception be raised). But it's another approach to removing the ambiguities and keeping everything immutable.
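
Concretely, that would be something like this (hypothetical API, not in this PR):

lm += foo(name="foo")
lm = lm.run()        # explicit, eager execution
value = lm["foo"]    # fine now; without run(), this would raise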

@nopdive I know you're a fan of async dispatch. Any thoughts on your end?

@hudson-ai (Collaborator, Author) commented:

Notes / status update for anyone watching this --

  • Everything works, but the remaining sticking points are mostly around the API.
  • I'm currently working on the "backfilling" discussed above in order to get rid of the ambiguity that comes with "forking". @nopdive and I outlined a version of that together that I think has acceptable ergonomics.
  • I'm leaning towards eliminating the get_async function and its siblings in favor of something that feels more like async dispatch, i.e. await lm.block_until_ready() or something of the sort. But deciding this can wait until the backfilling work is done.

hudson-ai marked this pull request as a draft on July 25, 2025.